Abstract: Existing studies have made extensive use of spatiotemporal
data to mine the mobility patterns of different kinds of travelers.
Smart Card Data (SCD) collected by Automated Fare Collection (AFC)
systems give a general view of the mobility patterns of bus and
metro riders across an urban area. Because mobility and stability
are temporally and spatially dynamic, and therefore difficult to
measure, few studies focus on how travel patterns change over a
long time interval. In this paper, we first present an overview of
the relation between the stability and regularity of public transit
riders based on SCD from Beijing. To analyze the temporal travel
patterns of urban residents, travelers are classified into two
categories: extreme and non-extreme travelers. We profile all
cardholders along two lines: a rule-based approach for extreme
travelers and an improved density-based clustering method for
non-extreme travelers. Similar clusters are aggregated according to
their features of regularity and occasionality. By combining the
transition matrix of passengers’ temporal travel patterns with
socioeconomic data of Beijing for 2010 and 2014, we present several
analyses of residents’ temporal mobility and stability that shed
light on the interdependence between stability and mobility in the
time dimension. The results indicate that passengers’ regularity is
hard to predict, that extreme travel patterns are more vulnerable,
and that overall non-extreme travel patterns stay nearly the same.

I. Introduction 

The continuum of human spatial immobility and mobility at varying
geographic and temporal scales poses fascinating topics and
challenges for researchers who inform decisions on urban
development. Stability and mobility are relative and linked:
mobility reflects movement over short temporal or small spatial
scales, while stability refers to the long term. Geographically,
people move over scales ranging from a few meters to hundreds of
kilometers; temporally, they move or stay over scales ranging from
a few minutes to many years. Although people’s movement seems
disordered, useful patterns for both individuals and groups of
residents can still be mined from various types of data. However,
owing to the lack of data, research on mobility and stability is
still seldom carried out. Working out the relation between
stability and mobility is therefore meaningful and can help uncover
different aspects of public transit, social and urban dynamics.

The temporally and spatially dynamic mobility patterns of residents
have long concerned researchers in transportation engineering [?],
computer science [20], urban planning [?], and even socioeconomics
[?]. With the development of computer science and geographic
information systems (GIS), many new technologies and new types of
data can be used to measure people’s mobility patterns in
large-scale regions, such as Call Detail Records of mobile phones
[?], taxicab GPS traces [?], or even outdoor Wi-Fi signal data. For
city-wide mobility analysis, smart card data (SCD) collected by
Automated Fare Collection (AFC) systems may be a better choice,
since AFC systems are widely adopted by public transportation
operators in most metropolitan areas [?].

AFC systems based on contactless smart cards are available for both
city buses and metros and record the details of each transaction
when passengers board or alight. SCD contain fine-grained
information not only about passengers’ IDs (smart card IDs) and the
locations of boarding or alighting stations, but also about
transaction times and bus/metro lines. SCD therefore make it
convenient to depict passengers’ daily, weekly or yearly travel
profiles in the large-scale regions covered by public transit
systems. From an individual perspective, SCD can help record a
passenger’s transit network, reflect the passenger’s social and
economic characteristics, and even forecast the passenger’s travel
pattern. From a city perspective, SCD act as a transportation probe
that can help estimate transportation conditions and provide new
material for urban planning policy.

In this paper, we utilize the temporal information of SCD 
to mine the relationship between passenger’s mobility and 
stability in different time and frequency scales. To better 
understand the passenger behavior in public transportation, 
we introduce other socioeconomic data into our analysis. Our 
contributions can be described as follows: 

• We take passengers’ regularity into account to analyze
the relation between regularity and stability.

• We profile passengers with a rule-based classification
approach for extreme travelers and an improved
density-based clustering method for non-extreme travelers.

• We analyze the mobility and stability of extreme and
non-extreme travelers at different group granularities
by combining socioeconomic data.

The organization of this paper is as follows. Related work is
briefly discussed in Section II. In Section III, we discuss the
relation between regularity and stability and profile the
passengers. Our analysis of mobility and stability is presented in
Section IV. Section V concludes the paper with a summary and a
short discussion of future research.

II. Related Work 

Hanson [?] is among the first researchers to focus on stability and
to show that analyzing individuals’ stability also requires
analyzing their mobility. Through an empirical example centered on
the relationship between entrepreneurship and place, Hanson
proposes that explicitly considering locational stability requires
examining stability and mobility in tandem, since spatiotemporal
dynamics are involved. Based on this idea, James et al. [?]
concentrate on detailed substructures and spatiotemporal flows of
mobility to show that individual mobility is dominated by small
groups of frequently visited, dynamically close locations, forming
primary “habitats” that capture typical daily activity. Many other
works [?] instead take a large-scale perspective on the mobility of
urban residents, vehicles or taxis.

To measure residents’ stability and mobility in urban areas, SCD
from public transit is one of the most widely used data sources.
According to Long et al. [?], SCD-related research topics can be
classified as: 1) data processing and data complementation, such as
back-calculation of origins and destinations and recognition of
trip purpose; 2) support and management of public transit systems;
3) place-based urban spatial structure; and 4) person-based
analysis of social networks and special groups of people. Pelletier
et al. [?] also give a literature review of SCD use in public
transit and present three levels of management and usage of SCD:
strategic (long-term planning), tactical (service adjustments and
network development), and operational (ridership statistics and
performance indicators). Zheng et al. [?] show several typical
applications based on SCD, such as building more accurate route
planners. Long et al. [?] seek to understand extreme public transit
riders in Beijing using both traditional household surveys and SCD.
In their work, public transit riders are classified into four
groups with different types of extreme transit behavior, and the
spatiotemporal patterns of these four behaviors are identified.
Further, Neal et al. [?] discuss personalizing transport
information services based on SCD. Among their contributions, the
authors use clustering to show that the usage of public
transportation can vary considerably between individuals. Each
passenger’s trips are aggregated into a weekday profile describing
the passenger’s temporal habits, and hierarchical agglomerative
clustering is introduced to discover groups of passengers with
different travel habits. In contrast to this approach, our weekly
profile, presented in Section III, consists of an hour-grained grid
and can show more detail.

According to our survey, many methods and algorithms have been
adopted to process and analyze SCD. To cluster temporal
information, Mahrsi et al. [?] construct temporal passenger
profiles based on boarding information and apply a generative
model-based clustering approach to discover clusters of passengers.
They also assign passengers, based on their boarding information,
to “residential” areas, which they establish through a clustering
of socioeconomic data of Rennes, France, to inspect how
socioeconomic characteristics are distributed over the temporal
passenger clusters. A density-based clustering method, DBSCAN [?],
which is very similar to OPTICS [?], is used in [?]. The authors
identify trip chains to detect transit riders’ historical travel
patterns and apply the K-Means++ clustering algorithm and rough-set
theory to cluster and classify travel pattern regularities.
Compared with the approaches reported in these works, we improve
the OPTICS algorithm to reduce the number of input parameters and
to control cluster size. Further, rather than focusing on people’s
mobility patterns alone, we use SCD to measure the interdependence
between stability and mobility in the time dimension.

Fig. 1. Weekly profiles of two passengers’ transaction times
(horizontal axis: hour). The transaction times (colored squares)
reflect their different travel patterns. Numbers in squares give
the number of transactions in that hour. D1-D5: weekdays, D6:
Saturday, D7: Sunday.

III. Profiling Passengers
A. Dataset Description 

The Smart Card Data (SCD) collected and issued by Beijing Transit
Incorporated contain transit riders’ records for both the bus and
metro systems. Until the beginning of 2015, when all bus lines
switched to distance-based fares, there were two types of Automatic
Fare Collection (AFC) system on Beijing buses: flat fares and
distance-based fares. It is a design flaw of the bus smart card
system that the flat-fare system records the transaction (payment)
time at check-in, whereas the distance-based system records the
transaction time at check-out. For the Beijing metro system,
although passengers pay the fare when alighting, the system records
the time of both check-in and check-out. In this paper, to offset
the design flaw, we take the transaction time as the time of one
ride.

We select SCD with shared card IDs from two datasets, collected in
2010 and 2014. Each selected dataset covers one week, and both
contain the same 1.9 million smart card IDs, representing 1.9
million passengers who lived in Beijing at least from 2010 to 2014.
We assume each smart card represents one anonymous passenger and
ignore the case of passengers changing cards, which is not common
in Beijing. Each SCD record consists of 1) smart card ID, 2)
boarding or alighting time, and 3) station ID of the boarding or
alighting line.

(a) Plot of RE10 - RE14 (b) Plot of RE10 - Stability

Fig. 2. Relationship between regularity and stability

As the SCD of 2010 and 2014 each cover one week, we estimate each
passenger’s trip activities using a “weekly profile”, a vector of
168 (7 × 24) variables describing the distribution of the trip
activities. Each variable in the vector is the number of smart card
transactions in one hour of one day of the week. Figure 1
illustrates weekly profiles of passengers’ transaction times.
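The weekly profile described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weekly_profile` is hypothetical, transactions are assumed to be available as `datetime` objects, and the vector is assumed to be ordered day by day starting on Monday (D1), with 24 hourly slots per day.

```python
from datetime import datetime


def weekly_profile(transaction_times):
    """Count transactions per hour of the week in a 168-slot vector.

    Assumed layout (not stated in the text): index = weekday * 24 + hour,
    with Monday as weekday 0.
    """
    profile = [0] * (7 * 24)
    for t in transaction_times:
        profile[t.weekday() * 24 + t.hour] += 1
    return profile


# Two rides on Monday 2014-04-07 and one late ride on Sunday 2014-04-13.
rides = [
    datetime(2014, 4, 7, 8, 15),
    datetime(2014, 4, 7, 18, 40),
    datetime(2014, 4, 13, 23, 5),
]
profile = weekly_profile(rides)
```

Most entries of such a vector are zero for an occasional rider, which is why the later distance definition has to cope with sparse vectors.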

B. Regularity and Stability 

In this section, we examine the relation between passengers’
regularity and stability in their daily travel. The large amount of
SCD from 2010 and 2014 helps us understand each passenger’s weekly
travel regularity. We take three aspects of weekly regularity into
consideration:

• Travel frequency of the week, $W = d/7 \in [1/7, 1]$. Here,
d is the number of days on which the passenger travelled by
public transit.

• Travel frequency of every day. We count the number of
trips on each day of the week, $D = \{D_i \mid i = 1, \ldots, 7\}$.
The standard deviation of D is denoted $D_{sd}$.

• Temporal differences between daily trips. We acquire
the temporal differences of the n daily trips in one week,
$DIST = \{Dist_i \mid i = 1, \ldots\}$, by using the distance
calculation method presented in Section III-D1. $DIST_{sd}$ is
the standard deviation of DIST.

Then, since $D_{sd}$ and $DIST_{sd}$ are negatively correlated with
regularity, we define a passenger’s regularity (RE) as:

$RE = W \times e^{-D_{sd}} \times e^{-DIST_{sd}}, \quad RE \in (0, 1]$ (1)

Here, we use the exponential function ($e^{-x}$) to normalize RE to
the range from 0 to 1. We also compute each passenger’s stability
(Sta), which captures the variation between the passenger’s
regularities in 2010 and 2014 (RE10 and RE14): Sta = RE14/RE10.
Figure 2(a) shows the relation between RE10 and RE14; the
correlation coefficient is 0.0485. Figure 2(b) shows the relation
between RE10 and Sta; the correlation coefficient is -0.00059. Both
coefficients are below 0.1 in magnitude, which means that a
passenger’s regularities four years apart are nearly uncorrelated.
This suggests that regularity cannot be predicted across long time
intervals.
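As a rough illustration of Eq. (1) and the stability ratio, the sketch below computes RE and Sta for one passenger. It rests on several assumptions that the text does not settle: RE is read as the product of W and the two exponential terms, the population standard deviation is used, an empty list of temporal differences contributes a standard deviation of 0, and the function names are hypothetical.

```python
import math
from statistics import pstdev


def regularity(daily_trip_counts, temporal_differences):
    """RE = W * exp(-D_sd) * exp(-DIST_sd), under the assumptions above.

    daily_trip_counts: 7 trip counts, one per day of the week (D).
    temporal_differences: distances between daily trips (DIST).
    """
    days_travelled = sum(1 for c in daily_trip_counts if c > 0)
    w = days_travelled / 7
    d_sd = pstdev(daily_trip_counts)
    # Assumption: with no differences available, DIST_sd is taken as 0.
    dist_sd = pstdev(temporal_differences) if temporal_differences else 0.0
    return w * math.exp(-d_sd) * math.exp(-dist_sd)


def stability(re_2010, re_2014):
    """Sta = RE14 / RE10."""
    return re_2014 / re_2010


perfect = regularity([2] * 7, [0.0, 0.0])        # same pattern every day
occasional = regularity([2, 0, 0, 0, 0, 0, 0], [])
```

A passenger with identical trips on all seven days reaches the upper bound RE = 1, while a passenger who travels on a single day stays below W = 1/7.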

C. Identifying Extreme Travelers 

Before analyzing the transit behaviors of passengers in Beijing, we
separate all passengers into two groups: extreme travelers and
non-extreme travelers. We define and identify the extreme travelers
according to a survey in 2010 [?] as well as the researchers’ own
experience of living in Beijing. Four types of extreme travelers
are defined based on their weekday behaviors, by setting several
thresholds and drawing on empirical knowledge of Beijing, as listed
in Table I. For example, since most people’s working hours in
Beijing start at 8:30 or 9:00 am, a public transit boarding time
before 6:00 am is considered unusually early.

Fig. 3. Distance between two vectors, u and v, is the sum of the
time interval ($T_i = \min\{T_i^{p}, T_i^{n}\}$) and the absolute
difference between $u_i$ and $v_i$ ($\Delta_i$) in each position i,
where $u_i \neq v_i$.


TABLE I. Definitions of extreme travelers

Type                         Definition
Early Birds (EBs)            First trip < 6 AM, more than two days in five weekdays (60% of weekdays)
Night Owls (NOs)             Last trip > 10 PM, more than two days in five weekdays (60% of weekdays)
Tireless Itinerants (TIs)    > one and a half hours commuting, more than two days in a week
Recurring Itinerants (RIs)   > 30 trips in weekdays of a week (> 6 trips per day)
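The thresholds in Table I translate directly into a rule-based check. The following is a minimal sketch, not the authors' implementation: the input layout (five lists of transaction times in fractional hours, plus per-day commuting durations that are assumed to be precomputed) and the function name `extreme_types` are hypothetical, and a passenger is allowed to match more than one type because the text does not state a precedence.

```python
def extreme_types(weekday_trips, commute_hours_per_day=()):
    """Return the extreme-traveler labels of Table I, or ["NE"] if none apply.

    weekday_trips: 5 lists of transaction times in hours (e.g. 5.5 = 05:30).
    commute_hours_per_day: daily commuting durations in hours (assumed input).
    """
    early_days = sum(1 for trips in weekday_trips if trips and min(trips) < 6)
    late_days = sum(1 for trips in weekday_trips if trips and max(trips) > 22)
    long_days = sum(1 for hours in commute_hours_per_day if hours > 1.5)
    trip_count = sum(len(trips) for trips in weekday_trips)

    labels = []
    if early_days > 2:
        labels.append("EB")   # first trip before 6 AM on more than two weekdays
    if late_days > 2:
        labels.append("NO")   # last trip after 10 PM on more than two weekdays
    if long_days > 2:
        labels.append("TI")   # commuting over 1.5 hours on more than two days
    if trip_count > 30:
        labels.append("RI")   # more than 30 weekday trips in the week
    return labels or ["NE"]


early_bird = extreme_types([[5.5, 18], [5.75, 18], [5.9, 17.5], [8, 18], [8, 18]])
regular = extreme_types([[8, 18]] * 5)
long_commuter = extreme_types([[8, 18]] * 5, commute_hours_per_day=[2, 2, 2, 1, 1])
```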


D. Clustering Non-extreme Travelers 

As extreme travelers account for only a small proportion of
passengers (less than 5%), we cluster the non-extreme travelers to
characterize their travel patterns. This process consists of three
stages: 1) defining the distance (similarity) between different SCD
records; 2) clustering samples of SCD with our proposed
simplified-smoothed OPTICS algorithm; and 3) classifying all SCD
records with a K-Means-like algorithm according to the results of
the clustering stage.

1) Defining Distance between Smart Card Records: We count the SCD
transactions in each hour of the week to form a vector of 168 (24
hours × 7 days) variables, $V = \{v_i\}$ with $v_i \in \mathbb{N}$.
Several classic distance measures can be used to compare records,
such as Euclidean distance, Manhattan distance, and cosine
distance. In our tests, given two vectors, Euclidean distance and
Manhattan distance only accumulate the differences between
components in the same position of the two vectors; they do not
consider the influence of the components’ positions, which carry
the vectors’ temporal attribute. When computing cosine distance,
since the vectors are mostly sparse, the product of two components
in the same position is zero whenever one of them is zero, which
discards much useful information in the two vectors. Thus, these
classic distance formulas are not capable of measuring the time
interval between smart card transactions.
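A small example makes the limitation concrete. In the sketch below (the helper names are illustrative, not from the paper), a single ride at 08:00 on the first day is compared with a ride one hour later and with a ride twelve hours later. Euclidean distance and the dot product used by cosine distance cannot tell the two cases apart.

```python
import math


def one_trip(hour_of_week):
    """Weekly profile with a single transaction in the given hourly slot."""
    v = [0] * 168
    v[hour_of_week] = 1
    return v


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


base = one_trip(8)    # ride at 08:00
near = one_trip(9)    # ride at 09:00, one hour later
far = one_trip(20)    # ride at 20:00, twelve hours later

same_euclidean = euclidean(base, near) == euclidean(base, far)
same_dot = dot(base, near) == dot(base, far) == 0
```

Both comparisons come out identical, although the second passenger is clearly closer in time to the first than the third is.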

Fig. 4. cd of OPTICS and rd of both OPTICS and SS-OPTICS. MinPts = 4

To solve the above problems, we define a distance between two
vectors, the Transaction Distance ($D_{tran}$), which considers
both the vectors’ difference and their temporal attribute. Since
non-extreme passengers’ vectors are mostly sparse, we define the
distance between two vectors, u and v, as the sum of the component
distances ($D_i$) between $u_i$ and $v_i$. The component distance
($D_i$) consists of two parts: the time interval ($T_i$) and the
absolute difference of the two components’ values
($\Delta_i = |u_i - v_i|$). As for the time interval $T_i$, if one
of $u_i$ and $v_i$ equals 0, $T_i$ is the smaller of the previous
and the next time intervals to a non-zero component in the other
vector, namely $T_i = \min\{T_i^{p}, T_i^{n}\}$. If neither $u_i$
nor $v_i$ equals 0, $T_i = 0$. The Transaction Distance between
vectors u and v can then be represented as:

$D_{tran} = \sum_{i=0}^{167} \left( \min\{T_i^{p}, T_i^{n}\} + k \cdot |u_i - v_i| \right), \quad \text{s.t. } u_i \neq v_i$ (2)

Here, k, ranging from 0 to 3 in our tests, is a parameter that
balances the weights of T and Δ. Figure 3 shows an example of
computing the transaction distance, in which one differing position
has a time interval of 1 and another has a time interval of 0. If a
non-zero component at position i in one vector has no previous or
next non-zero component in the other vector, as for $u_i$ in the
example, the missing interval is set to $\min\{i, 167 - i\}$.
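The Transaction Distance of Eq. (2) can be sketched as follows. This is one reading of the definition rather than the authors' code: the function name is hypothetical, the search for the previous and next non-zero components is done in the vector whose component is zero at position i, and a missing neighbor on either side is replaced by min{i, 167 - i} as described above.

```python
def transaction_distance(u, v, k=1.0):
    """Sum of (time interval + k * value difference) over differing positions."""
    n = len(u)
    total = 0.0
    for i in range(n):
        if u[i] == v[i]:
            continue                      # only positions with u_i != v_i count
        delta = abs(u[i] - v[i])
        if u[i] != 0 and v[i] != 0:
            t = 0                         # both vectors have a transaction here
        else:
            other = v if v[i] == 0 else u  # the vector that is empty at slot i
            fallback = min(i, n - 1 - i)   # used when no neighbor exists
            prev_gap = next(
                (i - j for j in range(i - 1, -1, -1) if other[j] != 0), fallback
            )
            next_gap = next(
                (j - i for j in range(i + 1, n) if other[j] != 0), fallback
            )
            t = min(prev_gap, next_gap)
        total += t + k * delta
    return total


u = [0] * 168
v = [0] * 168
u[8] = 1   # one ride at 08:00 on the first day
v[9] = 1   # one ride at 09:00 on the first day
shifted_by_one_hour = transaction_distance(u, v, k=1.0)
```

With k = 1, the two differing slots each contribute a time interval of 1 and a value difference of 1. A ride shifted by twelve hours would contribute larger time intervals, which is exactly the temporal sensitivity the classic distances lack.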

2) Clustering Samples of SCD Records: We then cluster the vectors,
based on the distances between smart card transaction records, to
identify the travel patterns of public transit riders in Beijing.
Although K-Means and other centroid models cluster travel patterns
very efficiently, it is hard to specify the number of clusters (k)
before running the algorithm without prior knowledge. In our tests,
a recent fast-searching density-based clustering algorithm [?]
could only identify 4 or 5 obvious clusters. Thus, we propose an
improved density-based clustering algorithm based on OPTICS [?],
which suits our data and the distances defined above. We name it
Simplified-Smoothed-OPTICS (SS-OPTICS).

1) Simplify: The original OPTICS algorithm has two key concepts,
core-distance and reachability-distance.

Definition-1, Core-distance (cd): Let p be an object from a dataset
D, let ε be a distance value, let $N_\varepsilon(p)$ be the set
$\{x \in D \mid dist(p, x) < \varepsilon\}$, let MinPts be a
natural number and let MinPts-distance(p) be the distance from p to
its MinPts-th neighbor. Then, the core-distance of p is defined as

$\text{core-distance}_{\varepsilon, MinPts}(p) = \begin{cases} \text{UNDEFINED}, & \text{if } Card(N_\varepsilon(p)) < MinPts \\ MinPts\text{-distance}(p), & \text{otherwise} \end{cases}$

Definition-2, Reachability-distance (rd): Let $p, o \in D$, let
$N_\varepsilon(o)$ be the ε-neighborhood of o, and let MinPts be a
natural number. Then, the reachability-distance of p with respect
to o is defined as

$\text{reachability-distance}_{\varepsilon, MinPts}(p, o) = \begin{cases} \text{UNDEFINED}, & \text{if } |N_\varepsilon(o)| < MinPts \\ \max(\text{core-distance}(o), distance(o, p)), & \text{otherwise} \end{cases}$

Here, ε and MinPts are the two input parameters of the original
OPTICS algorithm. According to OPTICS’s definitions, the green
points covered by the yellow circle in Figure 4 share the same
reachability-distance (rd), which equals the core-distance of point
o (cd). Although the green points $p_1$, $p_2$ and $p_3$ have the
same rd, their actual distances from point o differ, increasing
from $p_1$ to $p_3$.

Algorithm 1: Getting Ordered Points by OPTICS
Data: D (Unprocessed Dataset), ε
Result: OrderedPoints
initialization;
while D ≠ Null do
    Point = D.pop();
    OrderedPoints.append(Point);
    P_neighbors = Point.neighbor(ε);
    if P_neighbors ≠ Null then
        OrderSeeds = [];
        OrderSeeds.updateRD(Point, P_neighbors);
        while OrderSeeds do
            OrderSeeds.sort(key = RD);
            Seed = OrderSeeds.pop();
            OrderedPoints.append(Seed);
            S_neighbors = Seed.neighbor(ε);
            if S_neighbors ≠ Null then
                OrderSeeds.updateRD(Seed, S_neighbors)

The main ideas of OPTICS can be described as: 1)
reachability-distance represents density and 2)
reachability-distance determines the points’ output order, which in
turn determines the clusters. Based on these ideas, we find a
design flaw in OPTICS: the output order of $p_1$, $p_2$ and $p_3$
in the left example of Figure 4 may be arbitrary because their rd
values are equal. Thus, we design an improved OPTICS algorithm,
mainly shown in Algorithm 1, by abandoning the concept of
core-distance and defining a new reachability-distance (RD) as
follows.

Definition-3, New Reachability-distance (RD): Let $p, o \in D$ and
let $N_\varepsilon(o)$ be the ε-neighborhood of o. The
reachability-distance of p with respect to o is defined as

$\text{reachability-distance}_{\varepsilon}(p, o) = \begin{cases} \text{UNDEFINED}, & \text{if } |N_\varepsilon(o)| = 0 \\ distance(o, p) \ \text{s.t. } p \in N_\varepsilon(o), & \text{otherwise} \end{cases}$
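The ordering step with the simplified reachability-distance can be sketched in Python as below. The sketch follows the structure of Algorithm 1 but fills in details the text leaves open, all of which are assumptions: the next unprocessed point is picked deterministically, a seed keeps the smallest RD seen so far (as in standard OPTICS), and `dist` is any distance function supplied by the caller. The names `ss_optics_order` and `update_seeds` are hypothetical.

```python
def ss_optics_order(points, eps, dist):
    """Return (order, rd) using RD(p, o) = dist(o, p) within the eps-neighborhood.

    rd[k] is None (UNDEFINED) for a point that was not reached from a neighbor.
    """
    unprocessed = set(range(len(points)))
    order, rd = [], []

    def update_seeds(seeds, center):
        # Every unprocessed eps-neighbor of `center` becomes (or stays) a seed;
        # assumption: keep the smallest reachability-distance found so far.
        for x in unprocessed:
            d = dist(points[center], points[x])
            if d < eps and (x not in seeds or d < seeds[x]):
                seeds[x] = d

    while unprocessed:
        start = min(unprocessed)       # deterministic pick, an assumption
        unprocessed.remove(start)
        order.append(start)
        rd.append(None)
        seeds = {}
        update_seeds(seeds, start)
        while seeds:
            # Visit the seed with the lowest RD next (ties broken by index).
            nxt = min(seeds, key=lambda x: (seeds[x], x))
            rd.append(seeds.pop(nxt))
            order.append(nxt)
            unprocessed.remove(nxt)
            update_seeds(seeds, nxt)
    return order, rd


# Two well-separated groups on a line.
order, rd = ss_optics_order([0, 1, 2, 10, 11], eps=3, dist=lambda a, b: abs(a - b))
```

In the toy example, the undefined entries mark where a new dense group starts, and the low values in between form the valleys that the smoothing stage looks for.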

2) Smooth: The 2D plot of the ordered points’
reachability-distances helps us distinguish the clusters. The
denser the points gather, the lower their reachability-distances,
so the “valley” shapes in the reachability-distance curve represent
clusters with high density. In Figure 5, the blue and green lines
are the rd curves of OPTICS and SS-OPTICS, respectively. We notice
that, although the RD values of SS-OPTICS are smaller than those of
OPTICS, the two curves are extremely similar.













Fig. 5. RD curves of OPTICS and SS-OPTICS, ε = 100 and S = 41


The red line in Figure 5 is the smoothed RD of SS-OPTICS, $RD'$. We
smooth the RD curve with two aims: 1) easily identifying the
valley-shaped clusters and 2) controlling the size of a cluster. We
use a mean filter to smooth the RD curve, which achieves both goals
with only one parameter, the window size (S). Each value of the
smoothed RD curve, $RD'_i$, is the mean of the RD values of the
points within the window:

$RD'_i = \Big( \sum_{j=i-n}^{i+n} RD_j \Big) / S, \quad \text{s.t. } n = \frac{S-1}{2}$ (3)

Since RD has been filtered by a window of size S, it should be
noted that the boundary of each valley-shaped cluster is biased to
the left, by an offset that depends on S. After the mean filtering,
a valley (cluster) of the RD curve that contains too few points
relative to S is nearly filled up. Thus, the window size S sets a
lower bound on the cluster size.

Fig. 6. Four obvious categories of the heatmap of the 33 clusters.
D1-D5: weekdays, D6-D7: weekends

In our tests, the average size of the clusters generated by
SS-OPTICS is 10% larger than that of OPTICS, and their average
cohesion is around 3% smaller. Further, the clustering results of
the two methods are nearly the same if the input points follow
regular shapes, such as a square, a circle or a Gaussian
distribution. Both SS-OPTICS and OPTICS are insensitive to the
values of their input parameters and have a time complexity of
$O(n^2)$. However, SS-OPTICS needs only one clustering parameter
(ε, set to 100 in our experiment), while OPTICS needs two (ε and
MinPts). Meanwhile, SS-OPTICS makes it easier to control the
cluster size through the value of S. Finally, we iteratively
cluster several random samples of SCD, each containing 20000
entries, and identify 33 clusters that the next stage uses to
classify the whole dataset.
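The mean filter used in the smoothing stage can be illustrated with a short sketch. The function name `smooth_rd` is hypothetical, and two details are assumptions rather than statements from the text: the window size is taken to be odd so that the window is centered, and near the ends of the curve the mean is taken over the values that are actually available. Undefined RD values are assumed to have been replaced by a number (for example ε) beforehand.

```python
def smooth_rd(rd, window):
    """Centered mean filter over a reachability-distance curve."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window size is assumed to be a positive odd number")
    half = (window - 1) // 2
    smoothed = []
    for i in range(len(rd)):
        lo = max(0, i - half)
        hi = min(len(rd), i + half + 1)
        # Assumption: truncated windows at the edges use the available values.
        smoothed.append(sum(rd[lo:hi]) / (hi - lo))
    return smoothed


# A single-point valley is largely filled in by a window of size 3.
curve = [5.0, 5.0, 0.0, 5.0, 5.0]
filled = smooth_rd(curve, 3)
```

The one-point valley in the example is lifted from 0 to about 3.3, which shows in miniature how narrow valleys are filled up while wide ones survive, so that S acts as a control on cluster size.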

According to their transaction time distributions, the 33 clusters
fall clearly into 4 broad categories, as Figure 6 shows: one-day
trips, two-day trips, multi-day trips, and commuting trips. The
one-day trips, comprising 7 clusters (9-15), are concentrated in a
single day of the week, from Monday to Sunday. The transaction
times of two-day trips (clusters 1-8, 16 and 18-23) fall mainly on
two days of the week, while those of multi-day trips (clusters
24-27, 29 and 31-33) are dispersed over at least 3 days. The
commuting trips (clusters 17, 28 and 30) are mainly characterized
by regular transaction times on weekdays.


3) Classifying the Whole SCDs: The 33 clusters acquired by
SS-OPTICS are described as $C = [C_1, \ldots, C_{33}]$. Each $C_i$
in C is a one-dimensional vector containing 168 components. Each
component ($c_{ij}$) of $C_i$ is the incidence rate of passengers’
smart card transactions in the $(j \bmod 24)$-th hour of the
$\lfloor j/24 \rfloor$-th day of the week. We also add a 34th
cluster to C, whose components are all zero, to absorb noise
points. Thus, we can classify all the SCDs based on the clusters’
features, $C = [c_{ij}]_{34 \times 168}$.

Given what we already know, namely 1) the number of clusters k and
2) the features of each cluster $C_i$, a K-Means-like algorithm is
well suited to classifying the whole dataset, since the crux of
K-Means is fixing k and the centroid of each cluster. Each SCD
vector, $V = \{v_j\}$ with $v_j \in \mathbb{N}$, belongs to cluster
$C_i$, where

$i = \arg\max_i \sum_{j=1}^{168} v_j \times c_{ij}$

Then, we update the cluster $C_i$:

$c_{ij} = \begin{cases} \dfrac{n \times c_{ij} + v_j}{n + \|V\|_0}, & \text{if } v_j \neq 0 \\ \dfrac{n \times c_{ij}}{n + \|V\|_0}, & \text{otherwise} \end{cases}$ (4)

Here, n is the total number of transactions in $C_i$.
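The assignment and update steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the update uses the same denominator, n plus the number of non-zero components of V, in both cases of Eq. (4); a vector whose score is zero for every cluster is sent to the all-zero noise cluster; and the names `assign_cluster` and `update_centroid` are hypothetical.

```python
def assign_cluster(v, centroids, noise_index=None):
    """Index of the centroid with the largest inner product with v."""
    scores = [sum(vj * cj for vj, cj in zip(v, c)) for c in centroids]
    best = max(range(len(centroids)), key=lambda i: scores[i])
    # Assumption: a vector that overlaps with no cluster is treated as noise.
    if scores[best] == 0 and noise_index is not None:
        return noise_index
    return best


def update_centroid(centroid, v, n):
    """Apply the running update of Eq. (4) to one centroid.

    n: total number of transactions already in the cluster.
    Assumption: both cases share the denominator n + ||V||_0.
    """
    nonzero = sum(1 for vj in v if vj != 0)
    # For v_j == 0 the numerator reduces to n * c_ij, matching the second case.
    return [(n * cj + vj) / (n + nonzero) for cj, vj in zip(centroid, v)]


centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
label = assign_cluster([0, 2, 0], centroids, noise_index=2)
noise = assign_cluster([0, 0, 1], centroids, noise_index=2)
updated = update_centroid([0.5, 0.0], [1, 0], n=1)
```

Because the centroids are fixed in number and initialized from the SS-OPTICS clusters, a single pass over the data is enough to label every card, in contrast to standard K-Means, which iterates from arbitrary starting centroids.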


IV. Mobility and Stability Analysis 

Mobility and stability patterns of people living in metropolitan
areas are hard to measure, owing to the huge number of residents
and the lack of complete methods to probe the whole population. As
many works [?] mention, using SCD collected by AFC systems is a
nearly ideal solution to this problem, since public transit is used
by a large proportion of urban residents and AFC systems record
their travel details. However, we still need to consider the
influence of many other factors, such as age distribution, social
scale, per capita income, type of job and city size, to analyze the
data and draw correct conclusions. Since the datasets of 2010 and
2014 are selected by the same smart card IDs, the mobility and
stability of a fixed set of passengers can be reflected by the
changes in their travel patterns between the two years. In this
section, we analyze passengers’ mobility and stability patterns
based on temporal information, combined with the background
socioeconomic factors listed in Table III.

TABLE II. Transition Matrix of Extreme Travelers

2010 \ 2014   EB      NO      TI      RI     NE        SUM
EB            1286    206     535     82     7605      9714
NO            299     2550    2200    153    30006     35208
TI            376     996     9488    182    48406     59448
RI            93      198     677     275    7351      8594
NE            8780    26357   82630   3977   1646118   1767862
SUM           10834   30307   95530   4669   1739486   1880826

EB: Early Birds, NO: Night Owls, TI: Tireless Itinerants, RI:
Recurring Itinerants and NE: Non-Extreme Travelers

Fig. 7. Amounts of cards in each of the 34 clusters in 2010 and
2014 (legend: clusters of 2010, clusters of 2014)

A. Relation between Mobility and Stability 

Different temporal scales of data reflect facts from different
perspectives. Weekly profiles, showing short-term mobility, depict
people’s living circles, while transitions of mobility patterns may
reveal how little lifestyle and social status change over several
years. A case study of the variability of temporal patterns in
Singapore [?] shows that variability of mobility patterns can be
observed at the individual and spatially aggregated scales, while
the overall urban movements remain almost the same. Jiang et al.
[?] cluster individuals’ daily activity patterns according to their
usage of space and time within one year, and show that daily
routines can be highly predictable at a group scale. However, the
analysis of the relation between regularity and stability in
Section III-B shows that it is hard to predict a passenger’s
regularity. Thus, the predictability of a passenger’s regularity is
likely to be influenced by the time interval of prediction.

Meanwhile, our long-term analysis of mobility and stability and the
above two works focusing on short-term analysis [21], [6]
corroborate each other to some extent. As the fine-grained and
coarse-grained comparisons of non-extreme passengers’ transit
profiles between 2010 and 2014 in the following sections show,
long-term dynamics of extreme travelers give us a snapshot of urban
dynamics. Along with the growth of Beijing’s population and urban
size, inhabitants’ travel patterns change a lot, but people’s
lifestyles are more or less at a standstill. They still live in
such a big city, they still have to use public transit systems, and
they still have no choice but to ride buses or metros. The
passengers’ behavior shows how difficult it is to climb to a higher
social stratum by leaps and bounds within four years. The relation
between mobility and stability also helps us better understand that
people and society advance by steps, not by leaps.

Fig. 8. Heatmap of the 34 clusters’ transition matrix


B. Extreme Travelers Analysis 

According to the classification criteria proposed in Table I, we
obtain the transition matrix of the 4 types of extreme travelers
(EB, NO, TI, RI) from 2010 to 2014, shown in Table II. The numbers
of extreme travelers in 2010 (112964) and 2014 (141340) are both
very small compared with that of non-extreme travelers. In
addition, 84% of the extreme travelers in 2010 converted into
non-extreme travelers in 2014, which means that extreme travelers’
life patterns do not remain stable for long.

Among the transitions between the 4 extreme types, however, we
still find that the numbers of EB, NO and TI travelers in 2010 who
kept the same type in 2014 (1286, 2550 and 9488) account for the
highest share. That is, an extreme pattern is more likely to keep
its original status than to convert into another extreme pattern.
This also agrees with the finding of our previous work [9] that
most EBs, NOs and TIs are full-time workers, implying that
full-time workers are less likely to change their jobs (and hence
their travel patterns) than the unemployed. The fact that the
transition rate of TI to TI (86%) greatly exceeds that of TI to EB,
NO or RI suggests that people in the TI group may have work
patterns that differ more from those of the other groups.
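The shares discussed above can be recomputed from the counts in Table II. The sketch below is illustrative only: it assumes that rows are the 2010 types and columns the 2014 types, and the helper names are hypothetical. Its output can be compared against the percentages quoted in the text.

```python
LABELS = ["EB", "NO", "TI", "RI", "NE"]

# Counts from Table II; rows assumed to be 2010 types, columns 2014 types.
COUNTS = [
    [1286, 206, 535, 82, 7605],
    [299, 2550, 2200, 153, 30006],
    [376, 996, 9488, 182, 48406],
    [93, 198, 677, 275, 7351],
    [8780, 26357, 82630, 3977, 1646118],
]


def extreme_to_non_extreme_share(counts):
    """Share of 2010 extreme travelers who are non-extreme (NE) in 2014."""
    extreme_rows = counts[:4]
    moved = sum(row[4] for row in extreme_rows)
    total = sum(sum(row) for row in extreme_rows)
    return moved / total


def stay_rate_among_extremes(counts, i):
    """Share of type i keeping its type, among those who remain extreme."""
    return counts[i][i] / sum(counts[i][:4])


ne_share = extreme_to_non_extreme_share(COUNTS)
ti_stay = stay_rate_among_extremes(COUNTS, LABELS.index("TI"))
```

Under this reading, the TI stay rate among travelers who remain extreme comes out at about 0.86.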


















TABLE III. Socioeconomic Factors in Beijing

Year   Population   Population Density   Private Vehicles   Bus Volume   Metro Volume
2010   17.55 mil.   1224 persons/km²     2.97 mil.          5.165 bil.   1.423 bil.
2014   21.15 mil.   1498 persons/km²     4.25 mil.          4.843 bil.   3.205 bil.



C. Non-Extreme Travelers Analysis 

As Figure 6 illustrates, the heatmaps of the non-extreme clusters
fall into 4 categories. The heatmap features in each category are
so distinctive that the clusters look as if they were separated by
thresholds or a decision tree, which supports the accuracy of our
clustering method. After clustering the sample data and classifying
the whole dataset, we obtain the number of cards in each of the 34
clusters in 2010 and 2014, shown in Figure 7. The numbers for
one-day, two-day and multi-day trips do not vary much between 2010
and 2014. The number for one-day trips, around 60000, is slightly
larger than that for two-day trips, around 50000. The number for
multi-day trips (about 30000 in each cluster) is the smallest. This
may be explained as follows: the more dispersed a passenger’s trips
are over the week, the fewer such passengers there are. Except for
workers commuting by public transit, only a small portion of people
travel a lot with their travel times irregularly distributed over
the week. These data also show that more people in Beijing use
public transit occasionally, mainly on one or two days. This may be
related to the huge urban size, which leads residents to use public
transit only when they travel far.

However, almost all the towering bars in the figure belong to
commuting trips, which represent the public transit commuters
who make a home-to-work trip every weekday morning and return
home in the evening. On closer inspection, the number of
passengers belonging to commuting trips nearly doubled from
2010 to 2014. Two reasons should be considered to explain this
evident increase. One is that public transit became more
convenient from 2010 to 2014: the Beijing metro company built
8 more lines, for 15 lines in total, and the total metro length
increased rapidly from 228 km to 465 km during these four years.
The other is that ground transportation in Beijing became more
congested, since the total number of private vehicles in Beijing
increased from 2.97 to 4.25 million.

1) Fine-grained Analysis: Having acquired the numbers of
passengers in the 34 clusters in 2010 and 2014, we calculate
the transition (mobility) matrix of these clusters, shown as a
heatmap in the corresponding figure. In this heatmap, the
brighter a grid is, the more passengers belong to it. The bright
parts (red, orange, yellow and white) are easy to spot and are
mainly distributed in cluster 17, cluster 28 and cluster 30 of
both 2010 and 2014, which belong to the commuting trips category.
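
The counting behind such a transition matrix can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this paper: the input format (a dictionary from card ID to a 1-based cluster index per year), the function name `transition_matrix`, and the choice to count only cards observed in both years are all hypothetical.

```python
from collections import Counter


def transition_matrix(labels_2010, labels_2014, n_clusters):
    """Count cards moving from cluster i (2010) to cluster j (2014).

    labels_2010 / labels_2014: dicts mapping a card ID to its 1-based
    cluster index in that year (hypothetical input format).
    Assumption: only cards present in both years are counted.
    """
    counts = Counter(
        (labels_2010[card], labels_2014[card])
        for card in labels_2010.keys() & labels_2014.keys()
    )
    # Row i-1 holds the 2010 cluster, column j-1 the 2014 cluster.
    return [
        [counts[(i, j)] for j in range(1, n_clusters + 1)]
        for i in range(1, n_clusters + 1)
    ]


# Toy example with 3 clusters and made-up card IDs.
labels_2010 = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 3}
labels_2014 = {"a": 1, "b": 2, "c": 2, "d": 3, "f": 1}
matrix = transition_matrix(labels_2010, labels_2014, n_clusters=3)
for row in matrix:
    print(row)
```

In the toy data, cards "e" and "f" appear in only one year and are therefore dropped. With the 34 clusters of this study, the same counting would yield a 34 x 34 matrix whose diagonal entries correspond to cards that stay in the same cluster.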

This is especially true of the yellow and white grids
($C_{17 \to 17}$, $C_{17 \to 28}$, $C_{28 \to 17}$ and
$C_{28 \to 28}$), whose amounts are several times as large as
those of the other grids. This reflects the stability of people
in the commuting trips category, who tend to remain in the same
status. The weekly profiles of the four grids are shown as
heatmaps in the corresponding figure. Although their morning
and evening rush hours deviate by one hour, the stability is
reflected in the similar distribution of transaction times and
the identical interval between the morning and evening rush.
Their temporal profiles also show that most commuting trips of
passengers in Beijing are distributed mainly from Tuesday to
Friday. Why commuting passengers use public transit on weekdays
other than Monday would be interesting to investigate in the
future. A possible explanation is the Monday Morning Syndrome
(MMS): some people feel even more tired on Monday, after
relaxing over the weekend, than they do on Friday.


There are also some red and orange grids distributed in the
one-day trips region (clusters 9-15). The heatmap shows that
mutual transitions between the one-day trip category (clusters
9-15) and the commuting trip category (clusters 17 and 28)
happen frequently. Passengers in the one-day trip category are
regarded as occasional users of public transit. This transition
shows that passengers change their public transit usage patterns
from occasional to regular on weekdays. This can result from
many causes, such as changing job or work location, earning
enough money to buy a car, or taking the metro to work instead
of driving. The corresponding figure shows, for each cluster,
the ratio of passengers who rode the metro at least once a week.
The ratio in 2014 is clearly higher than that in 2010. Further,
the ratios of the commuting clusters (17, 28 and 30) reach local
peaks in both the 2010 and 2014 lines of the figure. This
suggests that commuting passengers may be the most stable group
and the one most willing to travel by metro.
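
The per-cluster metro ratio and its local peaks can be illustrated with a short sketch. It is a hypothetical reconstruction rather than the code behind the figure: the record format (pairs of cluster ID and a weekly-metro flag) and both function names are assumptions made for this example, and "local peak" is taken here to mean a ratio higher than that of both neighboring clusters in cluster order.

```python
def metro_ratio_by_cluster(cards):
    """Share of cards per cluster that rode the metro at least once a week.

    cards: iterable of (cluster_id, uses_metro_weekly) pairs
    (hypothetical record format).
    """
    totals, metro = {}, {}
    for cluster, uses_metro in cards:
        totals[cluster] = totals.get(cluster, 0) + 1
        metro[cluster] = metro.get(cluster, 0) + int(uses_metro)
    return {c: metro[c] / totals[c] for c in sorted(totals)}


def local_peaks(ratios):
    """Clusters whose ratio exceeds that of both neighbors in cluster order."""
    ids = list(ratios)
    return [
        ids[k]
        for k in range(1, len(ids) - 1)
        if ratios[ids[k]] > ratios[ids[k - 1]]
        and ratios[ids[k]] > ratios[ids[k + 1]]
    ]


# Toy data: three clusters with made-up cards.
toy_cards = [(1, False), (1, True), (2, True), (2, True), (3, False), (3, False)]
toy_ratios = metro_ratio_by_cluster(toy_cards)
print(toy_ratios, local_peaks(toy_ratios))
```

Running the same two functions once on the 2010 records and once on the 2014 records would give the two lines compared in the text.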


2) Coarse-grained Analysis: The transition matrix of the four
types of non-extreme travelers is also computed from the above
data and given in Table IV, which gives us a new perspective













TABLE IV. Transition Matrix of Non-extreme Travelers

           O        T        M        C       SUM
O     119270   193290    84311    31864    428735
T     164817   298667   142436    48043    653963
M      64449   142399    76769    20038    303655
C      36642    79266    47757    11812    175477
SUM   385178   713622   351273   111757   1561830

O: One-day Trip, T: Two-day Trip, M: Multi-day Trip and C:
Commuting Trip


to analyze passengers' mobility and stability. Here, the number
of people who belonged to O and moved to C is denoted by
$T_{O \to C}$, a component of the transition matrix. However,
smart card data alone cannot prove our conjectures. To
better understand the mobility and stability of passengers,
we combine them with socioeconomic statistics of Beijing for
both 2010 and 2014 [16], shown in Table III. From 2010 to
2014, the population of Beijing grew by 3.6 million and the
population density in the urban area rose from 1224 to 1498
persons per square kilometer. Along with the population growth,
the total number of private vehicles in Beijing rose from 2.97
million to 4.25 million. Together, these factors show that the
urban area of Beijing became more crowded and that more
vehicles led to more congested ground transportation in 2014.
As for the transition matrix, the ratios of the components in
each row are very close (approximately O:T:M:C = 6:14:7:2),
implying that the overall travel patterns of passengers in
Beijing did not change much from 2010 to 2014. Although the
population and the number of vehicles in Beijing increased
considerably, the data reveal the stability of the travel
patterns of public transit riders. However, as Table III
indicates, the total volume of the Beijing Metro System more
than doubled during the four years, while the volume of the bus
system decreased. This unusual decline can be explained by the
government's focus on expanding the Beijing Metro System to
mitigate the congestion brought by the fast-growing population
and number of vehicles. But the transition matrix reveals that
the additional metro lines cannot fix the root cause of this
problem. In conclusion, like the findings in [21] and (TtII,
the coarse-grained analysis shows that the mobility of some
residents may change, but the overall travel patterns of
passengers stay nearly the same.
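
The row-wise comparison behind the approximate O:T:M:C ratio can be reproduced directly from the counts in Table IV. The sketch below is an illustration only: the helper name `scaled_row` and the choice to scale each row to 29 parts (6 + 14 + 7 + 2) are assumptions introduced here, so that every row can be read against the quoted ratio.

```python
labels = ["O", "T", "M", "C"]

# Counts copied from Table IV (SUM row and column omitted).
table_iv = [
    [119270, 193290, 84311, 31864],
    [164817, 298667, 142436, 48043],
    [64449, 142399, 76769, 20038],
    [36642, 79266, 47757, 11812],
]


def scaled_row(row, total_parts=29):
    """Rescale one row so that its entries sum to about total_parts."""
    row_sum = sum(row)
    return [round(total_parts * value / row_sum, 1) for value in row]


row_sums = [sum(row) for row in table_iv]
for label, row in zip(labels, table_iv):
    print(label, scaled_row(row))
```

Each printed row can then be compared with 6:14:7:2. The agreement is approximate rather than exact, but in every row T is the largest component and C the smallest.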

V. Conclusions 

Smart card data give us another opportunity to observe the
operation of our cities, where movement is the perpetual norm.
In this paper, we analyze the relation between passengers'
regularity and stability. For the non-extreme travelers, we
cluster them using SS-OPTICS, which we propose, to discover
their transition patterns between different clusters. By
combining socioeconomic data, we present several analyses of
residents' temporal mobility and stability. Extreme travelers
are the most vulnerable: the stability of their life patterns
does not last long. According to the clustering outcomes and
our analyses, non-extreme travelers show high mobility through
frequent transitions between different fine-grained clusters.
However, the stability of their travel patterns is also evident
when the coarse-grained classification is introduced into our
analysis. As the population and the number of vehicles in
Beijing grew, the government built more metro lines to mitigate
congestion, but this could not solve the problem, and public
transit riders overall keep nearly the same travel patterns. At
the individual level, however, a passenger's transit trips are
hard to predict from short-term travel behavior, as when
predicting a passenger's regularity in 2014 from that in 2010.
Further, prediction is even more difficult for extreme
travelers.

Several improvements can be made to the work presented herein.
Firstly, the accuracy of SCD can be enhanced in the future by
adopting robust methods to mitigate the deviations of boarding
and alighting times. Secondly, the proposed SS-OPTICS algorithm
can be improved to find a better way to define the boundaries
of clusters. Thirdly, more fine-grained socioeconomic data can
be introduced into our analysis.